Hierarchical Clustering and Dendrograms in R for Data Science
In the early stages of data analysis, it is important to get a high-level understanding of multi-dimensional data and to find patterns among the variables. This is where clustering comes in. This blog post focuses on agglomerative hierarchical clustering, its applications, and a practical example in R.

Agglomerative clustering starts with every data point as its own cluster and repeatedly merges the two closest clusters until only one remains. Two questions should arise at this point: 1) When we say we group the two closest nodes together, how do we define "close"? 2) What merging approach do we use to group them?

Let's start with a small dataset and see how dendrograms are formed in RStudio. I have drawn both the x and y coordinates of the dataset from a normal distribution and numbered the data points for reference. First, we store the x and y values as the x- and y-coordinates of a data frame.
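A minimal sketch of this setup might look like the following. The seed, the number of points, and the means and standard deviations are assumptions chosen for illustration, not the values behind the original dataset. The last lines hint at how the two questions above map onto base R: `dist()` decides what "close" means (Euclidean distance by default) and the `method` argument of `hclust()` decides how clusters are merged (complete linkage by default).

```r
# Illustrative values only: seed, sample size, means and sd are assumptions.
set.seed(1234)
n <- 12

# Draw x and y coordinates from normal distributions.
x <- rnorm(n, mean = rep(1:3, each = 4), sd = 0.2)
y <- rnorm(n, mean = rep(c(1, 2, 1), each = 4), sd = 0.2)

# Store the coordinates as the x and y columns of a data frame.
df <- data.frame(x = x, y = y)

# Plot the points and number them so they can be matched to the dendrogram.
plot(df$x, df$y, col = "blue", pch = 19, cex = 1.5, xlab = "x", ylab = "y")
text(df$x + 0.05, df$y + 0.05, labels = as.character(seq_len(n)))

# Question 1: "close" is defined by a distance; dist() uses Euclidean by default.
d <- dist(df)

# Question 2: the merging approach is the linkage; hclust() uses "complete" by default.
hc <- hclust(d, method = "complete")

# Draw the dendrogram.
plot(hc)
```

Changing `method` to `"average"` or `"single"` is one way to see how the choice of linkage changes the shape of the tree for the same points.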